A Component Architecture for Dynamically Managing Privacy Constraints in Personalized Web-Based Systems
Abstract
User-adaptive (or “personalized”) systems on the web tailor their interaction to each individual user and provide considerable benefits to both users and web vendors. These systems pose privacy problems, however, since they must collect large amounts of personal information to be able to adapt to users, and often do so in a rather inconspicuous manner. Interaction with personalized systems is therefore likely to be affected by users' privacy concerns, and in many cases is also subject to privacy laws and self-regulatory privacy principles. An analysis of nearly 30 international privacy laws revealed that many of them impose severe restrictions not only on the data that may be collected but also on the personalization methods that may be employed. For many personalization goals, more than one method can be used, and these methods differ in their data and privacy requirements and in their anticipated accuracy and reliability. This paper presents a software architecture that encapsulates the different personalization methods in individual components and, at any point during runtime, selects the component that is permissible under the currently prevailing privacy constraints and has the best anticipated personalization effects.

1 Personalized systems on the web: benefits and methods

User-adaptive (or personalized) computer systems take individual characteristics of their current users into account and adapt their behavior accordingly. Such systems have already been deployed in several areas, including education and training (e.g., [1]), online help for complex PC software (e.g., [2, 3]), dynamic information delivery (e.g., [4]), provision of computer access to people with disabilities (e.g., [5, 6]), and to some extent information retrieval. In several of these cases, benefits for users could be empirically demonstrated.
Since about 1998, personalization technology has been deployed on the World Wide Web, where it is mostly used for customer relationship management. The aim is to provide value to customers by serving them as individuals and by offering them a unique personal relationship with the business (the terms micro marketing and one-to-one marketing are used to describe this business model [7, 8]). Current personalization on the web is still relatively simple. Examples include customized content (e.g., personalized finance pages or news collections), customized recommendations or advertisements based on past purchase behavior, customized (preferred) pricing, tailored email alerts, and express transactions [9]. Personalization that is likely to be found on the web in the future includes, e.g., product descriptions whose complexity is geared towards the presumed level of user expertise; tailored presentations that take users’ preferences regarding product presentation and media types (text, graphics, video) into account; recommendations that are based on recognized interests and goals of the user; and information and recommendations by portable devices that consider the user’s location and habits. A number of studies indicate that users find personalization on the web useful [10, 11], and that they stay longer at personalized websites and visit more pages [12]. Other research demonstrates that personalization also benefits web vendors with respect to the conversion of visitors into buyers [13], “cross-selling” [14], and customer retention and development [15, 16]. Personalized systems utilize numerous techniques for making assumptions about users, such as domain-based inference rules, stereotype techniques, machine learning techniques (e.g., content-based filtering and clique-based or “collaborative” filtering), plan recognition methods, logic-based reasoning, Bayesian inference, and many more (see [17] for a recent survey).
These techniques have different requirements regarding the data that must be available. For instance, most machine learning techniques assume that a large amount of raw data (such as a user’s clickstream data) is available and that all learning is performed at one time. Since single user sessions are often too short to deliver sufficient data, these techniques are typically applied to the data from several sessions with a given user. In contrast, incremental techniques can learn in several steps, taking the new raw data of the current session and the previous learning results into account.

2 Privacy problems caused by personalized systems

Personalized systems generally operate in a data-driven manner: the more data about the user is available, the more personalization can be performed, and personalization based on more data will also tend to be more accurate and more individualized. User-adaptive systems therefore collect considerable amounts of personal data and “lay them in stock” for possible future usage. Moreover, the collection of information about users is often performed in a relatively inconspicuous manner (such as by watching their web navigation behavior). Personalized systems are therefore most certainly affected by the privacy concerns that a majority of today’s Internet users articulate, by the privacy laws that are in place, and by company and sector privacy policies.

2.1 Users’ privacy concerns

Numerous consumer surveys have been conducted so far that consistently reveal widespread privacy concerns among today’s Internet users.1 Respondents reported being (very) concerned about, e.g., threats to their privacy when using the Internet (81–87%), about divulging personal information online (67–74%), and about being tracked online (54–77%).
They indicated having left web sites that required registration information (41%), having entered fake registration information (24–40%), and having refrained from shopping online or having bought less due to privacy concerns (24–32%). An analysis of results from thirty surveys with a focus on web personalization is given in [18]. Hardly any survey data exist on whether Internet users agree with the usage of their personal data for personalized interaction. In a poll by an industry advocacy group for web personalization [11], 51% of the respondents indicated that they would be willing to give out information about themselves in order to receive an “online experience truly personalized for them” (the subjects of this study were, however, recruited from a “permission-based opt-in list”, which may have biased the sample). It seems prudent to assume that the general Internet privacy concerns documented by the mentioned consumer surveys also apply to the usage of personal data for web personalization purposes. Caution must be exercised, however, since users who claim to have privacy concerns do not necessarily exhibit a more privacy-minded interaction with web sites, as was demonstrated in experiments by [19].

2.2 Privacy laws and self-regulatory privacy principles

Privacy laws protect the data of identified or identifiable individuals. For privacy laws to be applicable, it is thus not required that the system actually identifies the user, but only that it is possible to identify the user with reasonable effort based on the data that the system collects. The latter situation often applies to personalized systems. The privacy laws of many countries not only regulate the processing of personal data in the national territory, but also restrict the trans-border flow of personal data, or even extend their scope beyond the national boundaries.
Such laws then also affect personalized web sites abroad that serve users in these regulated countries, even when there is no privacy law in place in the jurisdictions in which these sites are located. We collected nearly 30 international privacy laws and categorized them by the criteria that affect the design of personalized systems the most [20]. Categories include registration duties, record-keeping duties, reporting duties, disclosure duties at the website, the duty to respect certain user requests, the duty to respect user vetoes ("opt out"), the duty to ask for user permission ("opt in"), exceptions for very sensitive data, restrictions on data transfer abroad, restrictions on foreign sites collecting data inland, archiving/destruction of personal data, and “other” impacts on personalization. We found that if privacy laws apply to a personalized website, they often not only affect the conditions under which personal data may be collected and the rights that data subjects have with respect to their data, but also the methods that may be used for processing them. Below is a sample of several legal restrictions that substantially affect the internal operation of personalized hypermedia applications (more constraints will be discussed in the application example). Usage logs must be deleted after each session, except for billing purposes and certain record-keeping and fraud-related debt recovery purposes [21]. This provision affects, e.g., the above-mentioned machine learning methods that can be employed in a personalized hypermedia system. If learning takes place over several sessions, only incremental methods can be employed, since the raw usage data from previous sessions all have to be discarded. Usage logs of different services may not be combined, except for accounting purposes [21].

1 Links to most surveys that are available online can be found at http://www.privacyexchange.org/iss/surveys/surveys.html.
This is a severe restriction for so-called central user modeling servers that collect user data from, and make them available to, different user-adaptive applications [22]. User profiles are permissible only if pseudonyms are used, and profiles retrievable under pseudonyms may not be combined with data relating to the bearer of the pseudonym [21]. This clause mandates a “Chinese wall” between the component that receives data from identifiable users and the user modeling component that makes generalizations about pseudonymous users and adapts hypermedia pages accordingly. No fully automated individual decisions are allowed that produce legal effects concerning the data subject or significantly affect him, and which are based solely on automated processing of data intended to evaluate certain personal aspects relating to him, such as his performance at work, creditworthiness, reliability, conduct, etc. [23]. This prohibition has impacts on learner-adaptive hypermedia systems for tutoring [24]: e.g., if such systems assign formal grades, there has to be a human in the loop. Anonymous or pseudonymous access and payment must be offered if technically possible and reasonable [21, 25]. Strong encryption software is regulated in France [26], which may have impacts on the use of encryption to protect user data in transit when a personalized website or the user is located in France. In addition to legislative regulations, the privacy practices of personalized web sites are also restricted by self-regulatory privacy principles, such as company-specific privacy policies or sector-specific principles (e.g., [27]). These principles can also severely impact the permissibility of personalization methods.

3 Privacy management

3.1 Pseudonymous and identified interaction

Two principled solutions are possible to cater to privacy requirements in personalized systems.
One direction is to allow users to remain anonymous with regard to the personalized system (and possibly even the whole network infrastructure) while enabling it to still link the same user across different sessions, so that it can cater to her individually. In [28, 29], we present a reference model for pseudonymous interaction between users and web-based applications in which full personalization can nevertheless take place.2 Pseudonymous interaction seems to be appreciated by users, even though only a single user poll has addressed this question explicitly so far [30]. One can expect that anonymity will encourage users to be more open when interacting with a personalized system, thus facilitating and improving the adaptation to this user. The fact that in most cases privacy laws no longer apply when interaction is anonymous also relieves the application provider from restrictions and duties imposed by such laws. Finally, anonymous and pseudonymous interaction is sometimes even legally mandated if it can be realized with reasonable effort [21, 25]. Anonymous and pseudonymous interaction also has several drawbacks, though: it requires an elaborate anonymity infrastructure, it is currently difficult to preserve when payments, physical goods and non-electronic services are being exchanged, and it harbors the risk of misuse. Anonymous personalization is also restricted to electronic channels only: pseudonymous data cannot be used for cross-channel communication (sending a brochure to a web customer by mail) or cross-channel recognition (recognizing a web customer in a brick-and-mortar store). These drawbacks become increasingly important since the number of web-only merchants is constantly shrinking. In the second principled approach to reconciling personalization and privacy, the user does not remain anonymous. Privacy issues are taken into account by respecting privacy laws, self-regulatory privacy principles, and/or users’ privacy preferences.
This paper deals exclusively with this second approach. It is specifically concerned with architectural issues of privacy management in personalized systems, i.e., software architectures and processes that allow a personalized system to dynamically cater to users’ privacy wishes and to regulatory constraints.

2 This model also protects the anonymity of the central user modeling server that contains the user’s data, since knowledge about its location may reveal the identity of the user, e.g. when it is sitting on the user’s local network.

3.2 Current work on privacy management

Current work in privacy management is mostly concerned with expressing privacy constraints for data, relating these constraints to software and business processes, and enforcing privacy constraints automatically. [31] introduces privacy meta-data tables which indicate, for each usage purpose and for each piece of information (attribute) collected for that purpose, the external recipients and the retention period. A second meta-table specifies access permissions. Processes like the Privacy Constraint Validator, the Attribute Access Control and the Data Retention Manager check compliance with privacy preferences and privacy policies. IBM’s Enterprise Privacy Architecture [32, 33] maps customer preferences and data against business processes, privacy rules, technology and the enterprise architecture as a whole, and thereby provides a mechanism for analyzing business processes in a privacy context. A “technical reference model” helps guarantee privacy at the transactional level, where the enterprise collects and uses personal information. This model relies on object, data and rules models to build applications that support and enhance privacy and collectively determine what privacy-relevant data is accumulated and how it must be handled. An authorization director evaluates the given policies and decides whether or not access requests to data sources are granted.
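The kind of attribute-level check that these meta-data approaches perform can be illustrated with a minimal sketch. All table contents and names below are hypothetical, not the actual schema of [31]: one meta-data table maps each (purpose, attribute) pair to the permitted recipients and a retention period, and every access request is checked against it.

```python
from datetime import timedelta

# Privacy meta-data in the spirit of [31] (values are illustrative):
# for each (purpose, attribute) pair, the recipients that may access
# the value and the maximum period for which it may be retained.
PRIVACY_METADATA = {
    ("recommendation", "clickstream"): ({"personalization-engine"}, timedelta(days=1)),
    ("billing", "purchase-history"): ({"accounting"}, timedelta(days=365)),
}

def access_permitted(purpose, attribute, recipient, data_age):
    """Deny unless the purpose was declared at collection time, the
    recipient is authorized for it, and the data is still within its
    retention period (a retention manager would delete it afterwards)."""
    entry = PRIVACY_METADATA.get((purpose, attribute))
    if entry is None:
        return False  # undeclared purpose: deny by default
    recipients, retention = entry
    return recipient in recipients and data_age <= retention

# Accounting may read purchase histories for billing; marketing may not.
print(access_permitted("billing", "purchase-history", "accounting", timedelta(days=30)))
print(access_permitted("billing", "purchase-history", "marketing", timedelta(days=30)))
```

Denying by default when no purpose entry exists mirrors the purpose-binding principle discussed above: data collected for one purpose cannot silently be reused for another.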
[34] focuses on the formulation of enterprise-independent privacy policies in the EP3P Privacy Policy Language to express and enforce restrictions on access to personal data in legacy systems. In a similar vein, [35] studies the more expressive logic-based Authorization Specification Language. [36] presents a formal security model based on state machines to enforce legal privacy requirements (such as purpose binding or the necessity of data processing); the model is based on the integrity concepts of well-formed transactions and separation of duty. The work presented in this paper complements these existing approaches in that it focuses on how a personalized system can dynamically adjust to the currently prevailing privacy constraints. Numerous stipulations in privacy laws, and most likely also user privacy concerns, influence personalization methods in very different ways. A globally preformulated policy for the selection of personalization methods under each different combination of impact factors does not seem feasible. Instead, a set of personalization methods must be dynamically selected at runtime, taking the current privacy constraints into account as well as the general premise to maximize personalization benefits. In the remainder of this paper, we will discuss our architecture that utilizes functionally related software components for this purpose.

4 Redundant-component architectures
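The dynamic selection sketched at the end of Section 3.2 can be illustrated as follows. All component names, data categories, and accuracy figures are illustrative assumptions, not the architecture's actual interface: each personalization method is wrapped in a component that declares the data it requires, and a selector picks, among the components whose requirements are permissible under the current privacy constraints, the one with the best anticipated personalization effect.

```python
from dataclasses import dataclass

@dataclass
class Component:
    """One personalization method, with its data needs and expected quality."""
    name: str
    required_data: set    # kinds of personal data the method must store
    accuracy: float       # anticipated quality of its personalization (0..1)

def select_component(components, permitted_data):
    """Among the components whose data requirements are permissible under
    the currently prevailing privacy constraints, return the one with the
    best anticipated personalization effect (None if none is admissible)."""
    admissible = [c for c in components if c.required_data <= permitted_data]
    return max(admissible, key=lambda c: c.accuracy, default=None)

# Two redundant methods for the same personalization goal: a more accurate
# learner that needs usage logs from several sessions, and an incremental
# learner that only needs the current session's data.
methods = [
    Component("cross-session-learner", {"multi-session-logs"}, accuracy=0.9),
    Component("incremental-learner", {"current-session-logs"}, accuracy=0.7),
]

# If usage logs must be deleted after each session [21], only the
# incremental method remains admissible and is therefore selected.
chosen = select_component(methods, permitted_data={"current-session-logs"})
print(chosen.name)
```

When the constraints change at runtime (e.g., a law such as [21] applies to a new user, or a user opts out of long-term profiling), re-running the selector over the same component set switches to the next-best admissible method without any change to the components themselves.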